Learning Visual Context for Group Activity Recognition
Authors
Abstract
Group activity recognition aims to recognize an overall activity in a multi-person scene. Previous methods strive to reason on individual features. However, they under-explore the person-specific contextual information, which is significant and informative in computer vision tasks. In this paper, we propose a new reasoning paradigm to incorporate global contextual information. Specifically, two modules bridge the gap between group activity and visual context. The first is a Transformer-based Context Encoding (TCE) module, which enhances individual representations by encoding global contextual information into individual features and refining the aggregated information. The second is a Spatial-Temporal Bilinear Pooling (STBiP) module, which first further explores pairwise relationships for the context-encoded representation and then generates semantic representations via gated message passing on a constructed spatial-temporal graph. On their basis, we design a two-branch model that integrates the designed modules into a pipeline. Systematic experiments demonstrate each module's effectiveness in either branch. Visualizations indicate that contextual cues can be aggregated globally by TCE. Moreover, our method achieves state-of-the-art results on widely used benchmarks using only RGB images as input and 2D backbones.
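To make the two modules described in the abstract concrete, below is a minimal sketch in PyTorch of how a Transformer-based context-encoding step and a bilinear-pooling / gated-message-passing step over per-person features could be wired together. All class names (TCESketch, STBiPSketch), feature dimensions, and the fully connected spatial-temporal graph are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of context encoding + spatial-temporal bilinear pooling.
import torch
import torch.nn as nn

class TCESketch(nn.Module):
    """Encodes global visual context into each individual's feature vector."""
    def __init__(self, dim=256, heads=4, layers=1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, person_feats):
        # person_feats: (batch, num_persons, dim) features from a 2D backbone.
        # Self-attention lets every person attend to all others, so the refined
        # features carry scene-level (global) context.
        return self.encoder(person_feats)

class STBiPSketch(nn.Module):
    """Pairwise bilinear relations followed by gated message passing on an
    assumed fully connected spatial-temporal graph."""
    def __init__(self, dim=256):
        super().__init__()
        self.proj_a = nn.Linear(dim, dim)
        self.proj_b = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, 1)

    def forward(self, x):
        # x: (batch, num_nodes, dim); nodes are persons across frames.
        a, b = self.proj_a(x), self.proj_b(x)
        # Bilinear pairwise relation scores between every pair of nodes.
        relations = torch.einsum('bid,bjd->bij', a, b) / a.shape[-1] ** 0.5
        weights = relations.softmax(dim=-1)
        messages = torch.einsum('bij,bjd->bid', weights, x)
        # Gated update: each node decides how much of the message to keep.
        g = torch.sigmoid(self.gate(messages))
        return x + g * messages

# Usage with hypothetical sizes: 2 clips, 12 persons, 256-dim features.
feats = torch.randn(2, 12, 256)
ctx = TCESketch()(feats)
out = STBiPSketch()(ctx)
group_repr = out.mean(dim=1)  # pool persons before a group-activity classifier
```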
Similar resources
Learning Temporal Context for Activity Recognition
Abstract. We present a method that improves activity recognition using temporal and spatial context. We investigate how incremental learning of long-term human activity patterns improves the accuracy of activity classification over time. Two datasets collected over several months, containing hand-annotated activity in residential and office environments, were chosen to evaluate the appro...
Machine learning based Visual Evoked Potential (VEP) Signals Recognition
Introduction: Visual evoked potentials contain certain diagnostic information which has proved to be important for assessing the functional integrity of the visual system. Due to the substantial decrease of amplitude under extra-macular stimulation in commonly used pattern VEPs, differentiating normal and abnormal signals can prove to be quite an obstacle. Due to developments of use of machine l...
Iterative context compilation for visual object recognition
This contribution describes an almost parameterless iterative context compilation method that produces feature layers especially suited for mixed bottom-up/top-down association architectures. The context model is simple and enables fast calculation. The resulting structures are invariant to the position, scale, and rotation of input patterns.
Dialogue Context for Visual Feedback Recognition
Head pose and gesture offer several key conversational grounding cues and are used extensively in face-to-face interaction among people. When recognizing visual feedback, people use more than their visual perception. Knowledge about the current topic and expectations from previous utterances help guide our visual perception in recognizing nonverbal cues. In this chapter, we investigate how dial...
Visual Learning for Landmark Recognition
Recognizing landmarks is a critical task for mobile robots. Landmarks are used for robot positioning and for building maps of unknown environments. In this context, traditional recognition techniques based on strong geometric models cannot be used. Rather, models of landmarks must be built from observations using image-based visual learning techniques. Beyond its application to mobile robot...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i4.16437